The University of Sheffield System at TAC KBP 2010
Authors
Abstract
This paper describes the University of Sheffield’s entry in the 2010 TAC KBP entity linking and slot filling tasks. This was our first participation in the TAC KBP track. Given limited human resources and a relatively late decision to participate, we chose to view our participation as an exploratory effort, aimed at educating us in the issues surrounding the tasks. With that perspective we decided to first adopt a fairly naive approach, see where it went wrong, then refine the approach as time permitted.
Our first “naive” approach to the entity linking (EL) task was to build a text collection from the textual description portion of the KB nodes, index this collection using a search engine tool, convert the EL query into a search engine query, and return as the answer the top-ranked KB node whose name matched the entity name in the query, provided the similarity score between the query and the KB node exceeded some threshold. Analysis of the failures of this approach suggested that a major problem was the insistence that a KB node name must match the query entity name exactly. The rest of our effort on the EL task went into exploring how we could relax this assumption.
Our first “naive” approach to the slot filling (SF) task was to treat it as a relation extraction task, which we tackled with a rule-based approach, given the shortage of training data and our limited development time and resources. We observed that the majority of slot values were one of the entity types person, organization, GPE or timex. We therefore chose to run a named entity recognition and classification (NERC) component that identified these entity types over the top-ranked texts retrieved from the test corpus (which had been indexed previously by our search engine tool) using a query derived from the SF query name and associated document.
For each slot, a set of manually developed rules was applied to sentences containing the query entity name and another entity whose type indicated it was a candidate value for that slot. Candidate entities matched by the rules were returned as slot values. After implementing this approach, little time was left for refinement. What limited time we had was spent analyzing what value of n should be chosen when selecting the top n documents returned in the retrieval stage for subsequent slot extraction.
The rest of this paper describes our approach and related investigations in more detail. Section 2 briefly describes existing language processing tools which we took “off-the-shelf”, to reduce our development time and to allow us to concentrate on the most interesting aspects of the tasks. Sections 3 and 4 describe in detail our approaches to the EL and SF tasks respectively. Section 5 concludes the paper and discusses potential future work.
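The “naive” entity linking approach described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the search engine tool or the similarity measure, so a plain term-frequency cosine similarity stands in for the retrieval score here. The function names, the KB record layout (`id`, `name`, `text`), the example data and the threshold value are all hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def link_entity(query_name, query_context, kb, threshold=0.2):
    """Return the id of the top-ranked KB node whose name exactly matches
    the query name and whose score reaches the threshold, else None.

    ASSUMPTION: cosine similarity over raw term counts replaces the
    (unnamed) search engine score used in the paper.
    """
    q_vec = Counter(tokenize(query_context))
    ranked = sorted(
        ((cosine(q_vec, Counter(tokenize(node["text"]))), node) for node in kb),
        key=lambda pair: pair[0],
        reverse=True,
    )
    for score, node in ranked:
        if score < threshold:
            break  # all remaining nodes score lower still
        if node["name"].lower() == query_name.lower():
            return node["id"]
    return None  # no sufficiently similar node with a matching name

# Hypothetical toy knowledge base, for illustration only.
EXAMPLE_KB = [
    {"id": "E1", "name": "Sheffield",
     "text": "Sheffield is a city in South Yorkshire England"},
    {"id": "E2", "name": "Sheffield United",
     "text": "Sheffield United is a football club based in Sheffield England"},
    {"id": "E3", "name": "Sheffield",
     "text": "Sheffield is a town in Massachusetts United States"},
]
EXAMPLE_CONTEXT = "The university is in the city of Sheffield in South Yorkshire"

if __name__ == "__main__":
    print(link_entity("Sheffield", EXAMPLE_CONTEXT, EXAMPLE_KB))
```

The exact-name test inside the loop is the assumption that the abstract identifies as the main source of failures: a query whose surface form differs from every KB node name can never be linked, however similar the texts are.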
Related resources
Saarland University Spoken Language Systems at the Slot Filling Task of TAC KBP 2010
For the slot filling task of TAC KBP 2010 we developed as a system a simple pipeline architecture whose main components are a two-stage retrieval module and a relation extraction module. We use word-cluster features in the system as a method of achieving generalization by exploiting raw text. In the relation extraction module we use distant supervision in order to extract training examples from...
Context-Based Entity Linking - University of Amsterdam at TAC 2012
This paper describes our approach to the 2012 Text Analysis Conference (TAC) Knowledge Base Population (KBP) entity linking track. For this task, we turn to a state-of-the-art system for entity linking in microblog posts. Compared to the little context microblog posts provide, the documents in the TAC KBP track provide context of greater length and of a less noisy nature. In this paper, we adap...
Wikipedia and the Web of Confusable Entities: Experience from Entity Linking Query Creation for TAC 2009 Knowledge Base Population
The Text Analysis Conference (TAC) is a series of Natural Language Processing evaluation workshops organized by the National Institute of Standards and Technology. The Knowledge Base Population (KBP) track at TAC 2009, a hybrid descendant of the TREC Question Answering track and the Automated Content Extraction (ACE) evaluation program, is designed to support development of systems that are cap...
UCD IIRG at TAC 2010 KBP Slot Filling Task
This paper describes the IIRG’s first implementation of a system for automatic Knowledge Base Population (KBP). The Text Analysis Conference (TAC), first organised by NIST in 2008, promotes further research in Natural Language Technologies. In 2009, NIST added a Knowledge Base Population Track to TAC; the goal of this track was to promote research into the automatic population of knowledge bases.
TCAR at TAC-KBP-2010
The TCAR team developed multiple information retrieval based entity linking systems in a matter of weeks for the TAC-KBP evaluation task. We focused primarily on developing entity linking algorithms that do not require Wikipedia text and correctly detect when a given entity does not exist in Wikipedia (NIL). Without using Wikipedia text, the system achieves an overall TAC 2010 score of 67 perce...
UBC at Slot Filling TAC-KBP 2011
This paper describes our submissions for the Slot Filling task of TAC-KBP 2011. The system takes as baseline the one we developed for the 2010 edition (Intxaurrondo et al., 2010), which is based on distant supervision. We did a straightforward implementation, trained using snippets of the document collection containing both entity and filler from the KB provided by the organizers. Our system do...